representation theory
Metric Transforms and Low Rank Representations of Kernels for Fast Attention
We introduce a new linear-algebraic tool based on group representation theory, and use it to address three key problems in machine learning. 1. Past researchers have proposed fast attention algorithms for LLMs by approximating or replacing softmax attention with other functions, such as low-degree polynomials. The key property of these functions is that, when applied entry-wise to the matrix $QK^{\top}$, the result is a low-rank matrix when $Q$ and $K$ are $n \times d$ matrices and $n \gg d$. This suggests a natural question: what are all functions $f$ with this property? If other such $f$ exist and are quickly computable, they can be used in place of softmax for fast subquadratic attention algorithms.
On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach
Giapitzakis, George, Fountoulakis, Kimon, Nichani, Eshaan, Lee, Jason D.
Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two semiautomata to studying the behavior of a random walk on the group $S_{N} \times S_{N}$. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.
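The mixing phenomenon underlying the hardness argument can be illustrated in miniature: a lazy random-transposition walk on a symmetric group approaches the uniform distribution after polynomially many steps. A minimal sketch on $S_4$ rather than the paper's $S_N \times S_N$ (the walk, step count, and trial count are illustrative choices, not the paper's construction):

```python
import itertools
import random
from collections import Counter

random.seed(0)
N = 4
perms = list(itertools.permutations(range(N)))

def step(p):
    # Lazy random-transposition walk: stay put with prob 1/2,
    # otherwise swap two uniformly random positions.
    if random.random() < 0.5:
        return p
    i, j = random.sample(range(N), 2)
    q = list(p)
    q[i], q[j] = q[j], q[i]
    return tuple(q)

def walk(t):
    p = tuple(range(N))
    for _ in range(t):
        p = step(p)
    return p

trials = 20000
counts = Counter(walk(30) for _ in range(trials))
uniform = 1 / len(perms)

# Total variation distance between the empirical endpoint
# distribution and the uniform distribution on S_4.
tv = 0.5 * sum(abs(counts.get(p, 0) / trials - uniform) for p in perms)
print(f"TV distance to uniform after 30 steps: {tv:.3f}")
```

After a modest number of steps the endpoint is statistically indistinguishable from a uniform permutation, which mirrors the paper's claim that distinct semiautomata become nearly uncorrelated after polynomially many steps.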
A Representation Theory for Ranking Functions
This paper presents a representation theory for permutation-valued functions, which in their general form can also be called listwise ranking functions. Pointwise ranking functions assign a score to each object independently, without taking into account the other objects under consideration, whereas listwise ranking functions evaluate the set of scores assigned to all objects as a whole. In many supervised learning-to-rank tasks it may be preferable to use listwise ranking functions; in particular, the Bayes-optimal ranking function may itself be listwise, especially if the loss function is listwise. A key obstacle to using listwise ranking functions has been the lack of an appropriate representation theory for such functions. We show that a natural symmetry assumption, which we call exchangeability, allows us to explicitly characterize the set of exchangeable listwise ranking functions. Our analysis draws on tensor analysis, functional analysis, and De Finetti-style theorems. We also present experiments using a novel reranking method motivated by our representation theory.
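The exchangeability property described above can be sketched concretely: a scorer is exchangeable (permutation-equivariant) if permuting the input list permutes the scores in exactly the same way. A minimal illustration in which each item's score depends on the whole list only through a permutation-invariant pooled summary (the architecture and all names here are illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
W = rng.standard_normal((d, d))
v = rng.standard_normal(d)

def scores(X):
    # Listwise scorer: a pointwise term X @ v plus a listwise term
    # that couples items through a permutation-invariant context.
    context = np.tanh(X @ W).mean(axis=0)  # invariant under row permutations
    return X @ v + X @ context

X = rng.standard_normal((5, d))
perm = rng.permutation(5)

# Permuting the list permutes the scores identically: exchangeability.
print(np.allclose(scores(X[perm]), scores(X)[perm]))  # True
```

A purely pointwise scorer satisfies this trivially; the point of the listwise setting is that the context term lets each score react to the rest of the list while the symmetry is preserved.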
Neural Networks: According to the Principles of Grassmann Algebra
In this paper, we explore the algebra of quantum idempotents and the quantization of fermions which gives rise to a Hilbert space equal to the Grassmann algebra associated with the Lie algebra. Since idempotents carry representations of the algebra under consideration, they form algebraic varieties and smooth manifolds in the natural topology. In addition to the motivation of linking up mathematical physics with machine learning, it is also shown that by using idempotents and invariant subspace of the corresponding algebras, these representations encode and perhaps provide a probabilistic interpretation of reasoning and relational paths in geometrical terms.
Mathematical Data Science
Douglas, Michael R., Lee, Kyu-Hwan
In this article we discuss an approach to machine-assisted mathematical discovery which one can call mathematical data science. In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and carrying out machine learning experiments and interpretations. Broadly speaking, the field of data science is concerned with assembling, curating, and analyzing large datasets, and with developing methods that enable users not just to answer predetermined questions about the data but to explore it, make simple descriptions and pictures, and arrive at novel insights. This certainly sounds promising as a tool for mathematical discovery! Mathematical data science is not new and has historically led to very important results. A famous example is the work of Birch and Swinnerton-Dyer leading to their conjecture [BSD65], based on computer generation of elliptic curves and linear regression analysis of the resulting data. However, the field really started to take off with the deep learning revolution and with the easy access to ML models provided by platforms such as PyTorch and TensorFlow, and built into computer algebra systems such as Mathematica, Magma, and SageMath.